Hi, I'm Abdullah¶

Predicting COVID-19 Infection from Symptoms with Machine Learning¶

Import Libraries¶

In [2]:
pip install dataprep
Successfully installed Werkzeug-2.2.2 dask-2022.11.1 dataprep-0.4.5 flask-2.2.2 flask-cors-3.0.10 jinja2-3.0.3 jsonpath-ng-1.5.3 metaphone-0.6 ply-3.11 pydantic-1.10.2 pydot-1.4.2 python-crfsuite-0.9.8 python-stdnum-1.18 rapidfuzz-2.13.2 regex-2021.11.10 scipy-1.9.3 sqlalchemy-1.3.24 varname-0.8.3 wordcloud-1.8.2.2
In [4]:
import pandas as pd
import numpy as np

# data visualization library 
import matplotlib
import matplotlib.pyplot as plt
import seaborn as sns

sns.set(context='notebook', style='darkgrid', palette='colorblind', font='sans-serif', font_scale=1, rc=None)
matplotlib.rcParams['figure.figsize'] =[6,6]
matplotlib.rcParams.update({'font.size': 15})
matplotlib.rcParams['font.family'] = 'sans-serif'
In [5]:
# dataprep: automated EDA helpers used below (explicit imports instead of a wildcard)
from dataprep.eda import plot_correlation
from dataprep.eda.missing import plot_missing

Data Analysis¶

In [36]:
covid = pd.read_csv('covid 19.csv')
covid
Out[36]:
S.No Fever BodyPain Age RunnyNose DiffBreath InfectionProb
0 1 102 0 9 0 -1 0
1 2 102 0 10 0 0 1
2 3 104 0 33 1 -1 0
3 4 101 1 59 0 1 0
4 5 99 0 98 0 0 0
... ... ... ... ... ... ... ...
2570 2571 99 0 90 0 0 1
2571 2572 100 0 53 0 -1 1
2572 2573 101 0 44 1 0 0
2573 2574 102 0 97 0 -1 1
2574 2575 104 1 62 1 -1 1

2575 rows × 7 columns

In [37]:
covid.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 2575 entries, 0 to 2574
Data columns (total 7 columns):
 #   Column         Non-Null Count  Dtype
---  ------         --------------  -----
 0   S.No           2575 non-null   int64
 1   Fever          2575 non-null   int64
 2   BodyPain       2575 non-null   int64
 3   Age            2575 non-null   int64
 4   RunnyNose      2575 non-null   int64
 5   DiffBreath     2575 non-null   int64
 6   InfectionProb  2575 non-null   int64
dtypes: int64(7)
memory usage: 140.9 KB
In [38]:
covid.describe(include='all')
Out[38]:
S.No Fever BodyPain Age RunnyNose DiffBreath InfectionProb
count 2575.000000 2575.000000 2575.000000 2575.000000 2575.000000 2575.000000 2575.000000
mean 1288.000000 100.969709 0.492816 51.023301 0.502136 0.002330 0.493592
std 743.482795 1.999771 0.500045 29.014442 0.500093 0.816969 0.500056
min 1.000000 98.000000 0.000000 1.000000 0.000000 -1.000000 0.000000
25% 644.500000 99.000000 0.000000 26.000000 0.000000 -1.000000 0.000000
50% 1288.000000 101.000000 0.000000 50.000000 1.000000 0.000000 0.000000
75% 1931.500000 103.000000 1.000000 76.500000 1.000000 1.000000 1.000000
max 2575.000000 104.000000 1.000000 100.000000 1.000000 1.000000 1.000000
In [39]:
covid.columns
Out[39]:
Index(['S.No', 'Fever', 'BodyPain', 'Age', 'RunnyNose', 'DiffBreath',
       'InfectionProb'],
      dtype='object')

Finding Missing Value¶

In [40]:
plot_missing(covid)
Out[40]:
DataPrep.EDA Report

Missing Statistics

Missing Cells: 0
Missing Cells (%): 0.0%
Missing Columns: 0
Missing Rows: 0
Avg Missing Cells per Column: 0.0
Avg Missing Cells per Row: 0.0
In [41]:
# Build a per-column summary of missing data
missing_values = covid.isnull().sum()                    # count of missing cells per column
percent_missing = missing_values / covid.shape[0] * 100  # share of rows missing, in percent

value = {
    'missing_values': missing_values,
    'percent_missing %': percent_missing
}
frame = pd.DataFrame(value)
frame
Out[41]:
missing_values percent_missing %
S.No 0 0.0
Fever 0 0.0
BodyPain 0 0.0
Age 0 0.0
RunnyNose 0 0.0
DiffBreath 0 0.0
InfectionProb 0 0.0
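Because the real dataset has no gaps, the table above is all zeros and does not show whether the summary logic works. The following is a minimal sketch on a made-up frame (the `demo` values are hypothetical, not taken from `covid 19.csv`) with one missing cell, using `isnull().mean()` as an equivalent way to get the percentage.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data: one missing Fever reading out of four rows.
demo = pd.DataFrame({
    "Fever": [102, np.nan, 104, 101],
    "Age": [9, 10, 33, 59],
})

summary = pd.DataFrame({
    "missing_values": demo.isnull().sum(),           # count per column
    "percent_missing": demo.isnull().mean() * 100,   # mean of booleans = fraction missing
})
print(summary)
```

The mean of a boolean mask is the fraction of `True` values, so it gives the same percentage as dividing the count by the number of rows.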

Data Visualization¶

Infection Probability (Target)¶

In [42]:
sns.countplot(x='InfectionProb',data=covid)
Out[42]:
<AxesSubplot:xlabel='InfectionProb', ylabel='count'>

Fever Distribution¶

In [43]:
sns.countplot(x='Fever', data=covid)
Out[43]:
<AxesSubplot:xlabel='Fever', ylabel='count'>
In [44]:
covid["Fever"].value_counts().plot.pie(explode=[0.1,0.1,0.1,0.1,0.1,0.1,0.1],autopct='%1.1f%%',shadow=True)
plt.title('Number of Cases');

Body Pain vs. Infection Probability¶

In [45]:
sns.countplot(x='BodyPain', hue='InfectionProb', data=covid)
Out[45]:
<AxesSubplot:xlabel='BodyPain', ylabel='count'>
In [46]:
covid["BodyPain"].value_counts().plot.pie(explode=[0.1,0.1],autopct='%1.1f%%',shadow=True)
plt.title('Number of Cases')
Out[46]:
Text(0.5, 1.0, 'Number of Cases')

Age vs. Infection Probability¶

In [47]:
sns.countplot(x='Age', hue='InfectionProb', data=covid)
Out[47]:
<AxesSubplot:xlabel='Age', ylabel='count'>

Runny Nose vs. Infection Probability¶

In [48]:
sns.countplot(x='RunnyNose', hue='InfectionProb', data=covid)
Out[48]:
<AxesSubplot:xlabel='RunnyNose', ylabel='count'>
In [49]:
covid["RunnyNose"].value_counts().plot.pie(explode=[0.1,0.1],autopct='%1.1f%%',shadow=True)
plt.title('Number of Cases');

Difficulty Breathing vs. Infection Probability¶

In [50]:
sns.countplot(x='DiffBreath', hue='InfectionProb', data=covid)
Out[50]:
<AxesSubplot:xlabel='DiffBreath', ylabel='count'>
In [51]:
covid["DiffBreath"].value_counts().plot.pie(explode=[0.1,0.1,0.1],autopct='%1.1f%%',shadow=True)
plt.title('Number of Cases');

Feature Transformation¶

In [52]:
from sklearn.preprocessing import LabelEncoder
e=LabelEncoder()
In [53]:
# LabelEncoder maps each column's sorted unique values to 0..n-1
# (e.g. Fever 98..104 -> 0..6, DiffBreath -1/0/1 -> 0/1/2).
# The encoder is refit for every column, so `e` only remembers the last one.
for col in ['Fever', 'BodyPain', 'Age', 'RunnyNose', 'DiffBreath', 'InfectionProb']:
    covid[col] = e.fit_transform(covid[col])
In [54]:
covid.head()
Out[54]:
S.No Fever BodyPain Age RunnyNose DiffBreath InfectionProb
0 1 4 0 8 0 0 0
1 2 4 0 9 0 1 1
2 3 6 0 32 1 0 0
3 4 3 1 58 0 2 0
4 5 1 0 97 0 1 0
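To make the encoding step concrete, here is a small sketch on made-up values (the `toy` frame is hypothetical). It keeps one fitted encoder per column, which is an assumption about what would be convenient rather than what the notebook does, so that the original values can be recovered later.

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder

# Hypothetical toy values that mimic the Fever and DiffBreath columns.
toy = pd.DataFrame({
    "Fever": [102, 98, 104, 99],
    "DiffBreath": [-1, 0, 1, -1],
})

encoders = {}
for col in toy.columns:
    enc = LabelEncoder()
    toy[col] = enc.fit_transform(toy[col])  # sorted unique values -> 0..n-1
    encoders[col] = enc                     # keep the fitted encoder for this column

print(toy)
print(encoders["Fever"].classes_)  # original values, indexed by their integer code

# inverse_transform maps the integer codes back to the original values.
fever_original = encoders["Fever"].inverse_transform(toy["Fever"])
```

Note that the codes are ranks among the values present, not offsets: with 100 and 101 absent from the toy column, 102 becomes 2 rather than 4.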
In [55]:
covid.dtypes.value_counts()
Out[55]:
int64    7
dtype: int64

Info About Our Data After Transformation¶

In [56]:
covid.describe(include='all')
Out[56]:
S.No Fever BodyPain Age RunnyNose DiffBreath InfectionProb
count 2575.000000 2575.000000 2575.000000 2575.000000 2575.000000 2575.000000 2575.000000
mean 1288.000000 2.969709 0.492816 50.023301 0.502136 1.002330 0.493592
std 743.482795 1.999771 0.500045 29.014442 0.500093 0.816969 0.500056
min 1.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000
25% 644.500000 1.000000 0.000000 25.000000 0.000000 0.000000 0.000000
50% 1288.000000 3.000000 0.000000 49.000000 1.000000 1.000000 0.000000
75% 1931.500000 5.000000 1.000000 75.500000 1.000000 2.000000 1.000000
max 2575.000000 6.000000 1.000000 99.000000 1.000000 2.000000 1.000000
In [57]:
covid.hist(figsize=(15,10));

Correlation Between Features¶

In [58]:
plot_correlation(covid)
Out[58]:
DataPrep.EDA Report

                              Pearson  Spearman  KendallTau
Highest Positive Correlation    0.048     0.048       0.045
Highest Negative Correlation   -0.036    -0.036      -0.036
Lowest Correlation              0.0       0.0         0.0
Mean Correlation               -0.002    -0.002      -0.003

  • Most positive correlated: (RunnyNose, DiffBreath)
  • Most negative correlated: (BodyPain, RunnyNose)
  • Least correlated: (Fever, DiffBreath)
In [59]:
corr=covid.corr()
corr.style.background_gradient(cmap='coolwarm',axis=None)
Out[59]:
  S.No Fever BodyPain Age RunnyNose DiffBreath InfectionProb
S.No 1.000000 0.033867 -0.030738 0.008658 0.016956 -0.015748 0.030006
Fever 0.033867 1.000000 -0.013039 -0.021883 0.011719 0.000281 -0.007187
BodyPain -0.030738 -0.013039 1.000000 -0.012199 -0.036059 0.005747 -0.031646
Age 0.008658 -0.021883 -0.012199 1.000000 0.029824 0.006668 -0.035496
RunnyNose 0.016956 0.011719 -0.036059 0.029824 1.000000 0.047533 -0.017423
DiffBreath -0.015748 0.000281 0.005747 0.006668 0.047533 1.000000 -0.027542
InfectionProb 0.030006 -0.007187 -0.031646 -0.035496 -0.017423 -0.027542 1.000000
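In the matrix above, every feature's correlation with `InfectionProb` is below 0.04 in absolute value, which suggests the features carry very little linear signal about the target. A quick way to read that off directly is to pull out the target column and rank it by magnitude. The sketch below uses a randomly generated stand-in frame (`demo` is hypothetical), not the real data.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)

# Hypothetical stand-in for the encoded `covid` frame.
demo = pd.DataFrame({
    "Fever": rng.integers(0, 7, 500),
    "BodyPain": rng.integers(0, 2, 500),
    "InfectionProb": rng.integers(0, 2, 500),
})

target_corr = (
    demo.corr()["InfectionProb"]      # correlation of every column with the target
    .drop("InfectionProb")            # remove the trivial self-correlation of 1.0
    .sort_values(key=lambda s: s.abs(), ascending=False)  # strongest first
)
print(target_corr)
```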

Machine Learning Algorithms¶

In [60]:
from sklearn.model_selection import train_test_split
from sklearn import metrics
from sklearn.metrics import accuracy_score
from sklearn.preprocessing import StandardScaler
In [61]:
x=covid.drop('InfectionProb',axis=1)
y=covid['InfectionProb']
In [62]:
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size = 0.20)
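Two details of the split above are worth noting: it has no `random_state`, so every run produces different scores, and `x` still contains `S.No`, which is a row identifier rather than a symptom. The sketch below shows one possible variant on a hypothetical toy frame (`demo`, the seed 42, and the choice to stratify are all assumptions, not the notebook's settings).

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(42)
n = 200

# Hypothetical toy frame with the same kind of columns as `covid`.
demo = pd.DataFrame({
    "S.No": np.arange(1, n + 1),
    "Fever": rng.integers(0, 7, n),
    "Age": rng.integers(0, 100, n),
    "InfectionProb": np.tile([0, 1], n // 2),  # perfectly balanced target
})

# Drop the row identifier along with the target.
X = demo.drop(columns=["S.No", "InfectionProb"])
y = demo["InfectionProb"]

X_train, X_test, y_train, y_test = train_test_split(
    X, y,
    test_size=0.20,
    random_state=42,  # fixed seed so the split is repeatable
    stratify=y,       # keep the class ratio identical in train and test
)
print(X_train.shape, X_test.shape)
print(y_test.value_counts().to_dict())
```

Applying this to the real data would change the reported scores, so the outputs in the cells below would need to be regenerated.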

Support Vector Machine (SVM)¶

In [63]:
from sklearn import svm
#Create a svm Classifier
clf = svm.SVC(kernel='linear') # Linear Kernel
#Train the model using the training sets
clf.fit(x_train, y_train)
#Predict the response for test dataset
y_pred = clf.predict(x_test)
#Score/Accuracy
from sklearn.metrics import confusion_matrix, classification_report
print(confusion_matrix(y_test, y_pred))
print(classification_report(y_test, y_pred))
[[120 138]
 [124 133]]
              precision    recall  f1-score   support

           0       0.49      0.47      0.48       258
           1       0.49      0.52      0.50       257

    accuracy                           0.49       515
   macro avg       0.49      0.49      0.49       515
weighted avg       0.49      0.49      0.49       515
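An accuracy of 0.49 is easier to interpret next to a trivial baseline. The support column above shows 258 samples of class 0 and 257 of class 1, so always predicting the majority class would already score about 50%. The sketch below reproduces that number with scikit-learn's `DummyClassifier`; the feature matrix is random filler, and the labels are fitted and scored on the same array purely to mirror those class counts.

```python
import numpy as np
from sklearn.dummy import DummyClassifier

rng = np.random.default_rng(0)

# Filler features; DummyClassifier ignores them.
X_demo = rng.normal(size=(515, 3))
# Same class counts as the test set in the report above.
y_demo = np.array([0] * 258 + [1] * 257)

baseline = DummyClassifier(strategy="most_frequent")
baseline.fit(X_demo, y_demo)

# Accuracy of always predicting class 0, in percent.
baseline_acc = baseline.score(X_demo, y_demo) * 100
print(round(baseline_acc, 2))
```

Any model that does not clearly beat this figure on held-out data is effectively guessing.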

In [64]:
# Reuse the classifier fitted in the previous cell;
# score() returns the mean accuracy on the test set.
acc_svc = clf.score(x_test, y_test) * 100
acc_svc
Out[64]:
49.12621359223301

Logistic Regression¶

In [65]:
from sklearn.linear_model import LogisticRegression
model = LogisticRegression()
#Fit the model
model.fit(x_train, y_train)
y_pred = model.predict(x_test)
#Score/Accuracy
acc_logreg=model.score(x_test, y_test)*100
acc_logreg
Out[65]:
48.932038834951456

K-Nearest Neighbors Classifier¶

In [66]:
from sklearn.neighbors import KNeighborsClassifier
knn = KNeighborsClassifier(n_neighbors=20)
knn.fit(x_train, y_train)
y_pred = knn.predict(x_test)
#Score/Accuracy
acc_knn=knn.score(x_test, y_test)*100
acc_knn
C:\Users\GC\anaconda3\lib\site-packages\sklearn\neighbors\_classification.py:228: FutureWarning: Unlike other reduction functions (e.g. `skew`, `kurtosis`), the default behavior of `mode` typically preserves the axis it acts along. In SciPy 1.11.0, this behavior will change: the default value of `keepdims` will become False, the `axis` over which the statistic is taken will be eliminated, and the value None will no longer be accepted. Set `keepdims` to True or False to avoid this warning.
  mode, _ = stats.mode(_y[neigh_ind, k], axis=1)
Out[66]:
48.932038834951456
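KNN classifies by distance, so features on large scales dominate the neighbor search; here `S.No` runs from 1 to 2575 while several symptom columns are 0 or 1. `StandardScaler` was imported earlier but never applied. The sketch below shows one way to chain scaling and KNN in a pipeline, on hypothetical random data; whether it improves results on this dataset is untested.

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(1)

# Hypothetical features on very different scales.
X_demo = np.column_stack([
    rng.integers(0, 100, 300),  # an Age-like column, 0..99
    rng.integers(0, 2, 300),    # a binary symptom flag
]).astype(float)
y_demo = rng.integers(0, 2, 300)

# The scaler is fitted inside the pipeline, so it only ever sees training data.
knn_scaled = make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=20))
knn_scaled.fit(X_demo, y_demo)

# Inspect what the classifier actually receives: zero mean, unit variance per column.
scaled = knn_scaled.named_steps["standardscaler"].transform(X_demo)
print(scaled.mean(axis=0).round(6), scaled.std(axis=0).round(6))
```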

Decision Tree Classifier¶

In [67]:
from sklearn import tree
t = tree.DecisionTreeClassifier()
t.fit(x_train,y_train)
y_pred = t.predict(x_test)
#Score/Accuracy
acc_decisiontree=t.score(x_test, y_test)*100
acc_decisiontree
Out[67]:
53.398058252427184
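A single 80/20 split gives one noisy estimate, so a 53% score may not differ meaningfully from the roughly 49% of the other models. Cross-validation averages over several splits and reports the spread. The sketch below runs 5-fold cross-validation on hypothetical random data; the fold count and seed are assumptions.

```python
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)

# Hypothetical encoded features and a random binary target.
X_demo = rng.integers(0, 7, size=(400, 4))
y_demo = rng.integers(0, 2, size=400)

scores = cross_val_score(
    DecisionTreeClassifier(random_state=0),
    X_demo, y_demo,
    cv=5,                 # five train/validation folds
    scoring="accuracy",
)
# Mean accuracy and its standard deviation across folds, in percent.
print(scores.mean() * 100, scores.std() * 100)
```

If the standard deviation across folds is a few percentage points, a gap of that size between two models is within noise.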

Naive Bayes¶

In [68]:
from sklearn.naive_bayes import GaussianNB
model = GaussianNB()
model.fit(x_train,y_train)
#Score/Accuracy
acc_gaussian= model.score(x_test, y_test)*100
acc_gaussian
Out[68]:
48.349514563106794

Model Score Comparison¶

In [69]:
models = pd.DataFrame({
    'Model': ['Support Vector Machines', 'KNN', 'Logistic Regression', 
               'Naive Bayes',   
              'Decision Tree'],
    'Score': [acc_svc, acc_knn, acc_logreg,  acc_gaussian, acc_decisiontree]})
models.sort_values(by='Score', ascending=False)
Out[69]:
Model Score
4 Decision Tree 53.398058
0 Support Vector Machines 49.126214
1 KNN 48.932039
2 Logistic Regression 48.932039
3 Naive Bayes 48.349515
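The comparison table above is assembled by hand from separately named variables, which makes it easy to pair a score with the wrong model name. A possible alternative, sketched here on hypothetical random data with only three of the five models, is to loop over a dictionary of estimators so that each name and score are recorded together.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)

# Hypothetical features and binary target.
X_demo = rng.normal(size=(300, 4))
y_demo = rng.integers(0, 2, size=300)
X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.20, random_state=0
)

candidates = {
    "Logistic Regression": LogisticRegression(),
    "Naive Bayes": GaussianNB(),
    "Decision Tree": DecisionTreeClassifier(random_state=0),
}

rows = []
for name, estimator in candidates.items():
    estimator.fit(X_tr, y_tr)
    # score() is mean accuracy; store it as a percentage next to the model name.
    rows.append({"Model": name, "Score": estimator.score(X_te, y_te) * 100})

results = (
    pd.DataFrame(rows)
    .sort_values(by="Score", ascending=False)
    .reset_index(drop=True)
)
print(results)
```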